NSF PAR Search | NSF Public Access Repository


Accurate assembly of circular RNAs with TERRACE

https://doi.org/10.1101/gr.279106.124

Zahin, Tasfia; Shi, Qian; Zang, Xiaofei Carl; Shao, Mingfu (September 2024, Genome Research)

Circular RNA (circRNA) is a class of RNA molecules that form a closed loop with their 5′ and 3′ ends covalently bonded. CircRNAs are known to be more stable than linear RNAs, have distinct properties and functions, and are promising biomarkers. Existing methods for assembling circRNAs rely heavily on annotated transcriptomes and hence exhibit unsatisfactory accuracy when a high-quality transcriptome is unavailable. We present TERRACE, a new algorithm for full-length assembly of circRNAs from paired-end total RNA-seq data. TERRACE uses the splice graph as the underlying data structure that organizes the splicing and coverage information. We transform the problem of assembling circRNAs into finding paths that “bridge” the three fragments in the splice graph induced by back-spliced reads. We adopt a definition for optimal bridging paths and a dynamic programming algorithm to calculate such optimal paths. TERRACE features an efficient algorithm to detect back-spliced reads missed by RNA-seq aligners, contributing to its much-improved sensitivity. It also incorporates a new machine-learning approach trained to assign a confidence score to each assembled circRNA, which is shown to be superior to using abundance for scoring. On both simulations and biological data sets, TERRACE consistently outperforms existing methods by a large margin in sensitivity while achieving better or comparable precision. In particular, when annotations are not provided, TERRACE assembles 123%–413% more correct circRNAs than state-of-the-art methods. TERRACE presents a significant advance in assembling full-length circRNAs from RNA-seq data, and we expect it to be widely used in future research on circRNAs.
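To make the bridging step more concrete, here is a minimal Python sketch of a dynamic program that finds one "best" path between two splice-graph nodes. It assumes, hypothetically, that the splice graph is a DAG whose node ids follow genomic order, that edges carry read-coverage weights, and that an optimal bridging path is one maximizing its smallest edge weight (a bottleneck criterion). The abstract does not state TERRACE's exact definition, so this illustrates the general idea only and is not TERRACE's algorithm; the function name `max_bottleneck_path` is invented for this example.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def max_bottleneck_path(
    edges: List[Tuple[int, int, float]], source: int, target: int
) -> Optional[Tuple[float, List[int]]]:
    """Hypothetical sketch: return (bottleneck, path) for the source -> target
    path whose smallest edge weight is largest, in a DAG whose node ids are
    already in topological (e.g. genomic) order. Returns None if unreachable."""
    out = defaultdict(list)
    nodes = {source, target}
    for u, v, w in edges:
        if u >= v:
            raise ValueError("expected edges to go from lower to higher node id")
        out[u].append((v, w))
        nodes.update((u, v))

    # best[v]: largest bottleneck of any path source -> v found so far
    best: Dict[int, float] = {source: float("inf")}
    parent: Dict[int, int] = {}
    # Ascending ids form a topological order, so best[u] is final when u is expanded.
    for u in sorted(nodes):
        if u not in best:
            continue  # not reachable from source
        for v, w in out[u]:
            cand = min(best[u], w)
            if cand > best.get(v, float("-inf")):
                best[v] = cand
                parent[v] = u

    if target not in best:
        return None
    # Walk parent pointers back from target to source.
    path = [target]
    while path[-1] != source:
        path.append(parent[path[-1]])
    return best[target], path[::-1]
```

For example, on edges `[(0, 1, 5), (1, 3, 2), (0, 2, 4), (2, 3, 3)]` the sketch prefers 0 → 2 → 3 (bottleneck 3) over 0 → 1 → 3 (bottleneck 2). Expanding nodes in topological order means each node's value is settled before it is extended, so the work is one pass over the edges plus sorting the nodes.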
Anchorage Accurately Assembles Anchor-Flanked Synthetic Long Reads

https://doi.org/10.4230/LIPIcs.WABI.2024.22

Zang, Xiaofei Carl; Li, Xiang; Metcalfe, Kyle; Ben-Yehezkel, Tuval; Kelley, Ryan; Shao, Mingfu (August 2024, Schloss Dagstuhl – Leibniz-Zentrum für Informatik)
Pissis, Solon P.; Sung, Wing-Kin (Eds.)
Modern sequencing technologies allow for the addition of short-sequence tags, known as anchors, to both ends of a captured molecule. Anchors are useful in assembling the full-length sequence of a captured molecule as they can be used to accurately determine the endpoints. One representative of such anchor-enabled technology is LoopSeq Solo, a synthetic long read (SLR) sequencing protocol. LoopSeq Solo also achieves ultra-high sequencing depth and high purity of short reads covering the entire captured molecule. Despite the availability of many assembly methods, constructing full-length sequences from these anchor-enabled, ultra-high-coverage sequencing data remains challenging due to the complexity of the underlying assembly graphs and the lack of specific algorithms leveraging anchors. We present Anchorage, a novel assembler that performs anchor-guided assembly for ultra-high-depth sequencing data. Anchorage starts with a k-mer-based approach for precise estimation of molecule lengths. It then formulates the assembly problem as finding an optimal path that connects the two nodes determined by anchors in the underlying compact de Bruijn graph. Optimality is defined as maximizing the weight of the smallest node while matching the estimated sequence length. Anchorage uses a modified dynamic programming algorithm to efficiently find the optimal path. Through both simulations and real data, we show that Anchorage outperforms existing assembly methods, particularly in the presence of sequencing artifacts. Anchorage fills the gap in assembling anchor-enabled data. We anticipate its broad use as anchor-enabled sequencing technologies become prevalent. Anchorage is freely available at https://github.com/Shao-Group/anchorage; the scripts and documents that can reproduce all experiments in this manuscript are available at https://github.com/Shao-Group/anchorage-test.
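The objective stated above (connect the two anchor nodes, maximize the weight of the smallest node, match the estimated length) can be solved with a dynamic program over (node, accumulated length) states. The Python sketch below is one straightforward way to do that; it is not Anchorage's "modified dynamic programming algorithm", whose details the abstract does not give. Assumptions made here: each node contributes a fixed positive integer length (for a compact de Bruijn graph, e.g. a unitig's length with the k−1 overlap already subtracted), walks may revisit nodes, and the length must match exactly. The names `anchored_path`, `succ`, `size`, and `weight` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def anchored_path(
    succ: Dict[str, List[str]],
    size: Dict[str, int],
    weight: Dict[str, float],
    source: str,
    target: str,
    length: int,
) -> Optional[Tuple[float, List[str]]]:
    """Hypothetical sketch: among walks source -> target whose node sizes sum
    to exactly `length`, return one maximizing the minimum node weight,
    as (bottleneck, walk). Returns None if no such walk exists."""
    if any(s <= 0 for s in size.values()):
        raise ValueError("node sizes must be positive so lengths strictly increase")

    # best[l][v]: largest bottleneck of a walk from source to v with total size l
    best: List[Dict[str, float]] = [dict() for _ in range(length + 1)]
    parent: Dict[Tuple[str, int], Tuple[str, int]] = {}
    if size[source] <= length:
        best[size[source]][source] = weight[source]

    # Sizes are positive, so every update goes to a strictly larger length;
    # best[l] is final when it is expanded, which also makes cycles safe.
    for l in range(length + 1):
        for u, b in best[l].items():
            for v in succ.get(u, []):
                nl = l + size[v]
                if nl > length:
                    continue
                cand = min(b, weight[v])
                if cand > best[nl].get(v, float("-inf")):
                    best[nl][v] = cand
                    parent[(v, nl)] = (u, l)

    if target not in best[length]:
        return None
    # Only the initial (source, size[source]) state lacks a parent pointer.
    walk, state = [], (target, length)
    while True:
        walk.append(state[0])
        if state not in parent:
            break
        state = parent[state]
    return best[length][target], walk[::-1]
```

As a usage example, take `succ = {"A": ["B", "C"], "B": ["D"], "C": ["C", "D"]}` with sizes A=3, B=4, C=2, D=3 and weights A=10, B=9, C=6, D=8. For a target length of 10 the sketch returns A → B → D (bottleneck 8) rather than A → C → C → D (bottleneck 6); for length 12 only A → C → C → C → D fits. Tracking the accumulated length in the state is what lets the length constraint coexist with the bottleneck objective, at a cost proportional to the target length times the number of edges.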
Three-step one-way model in terahertz biomedical detection

https://doi.org/10.1186/s43074-021-00034-0

Peng, Yan; Huang, Jieli; Luo, Jie; Yang, Zhangfan; Wang, Liping; Wu, Xu; Zang, Xiaofei; Yu, Chen; Gu, Min; Hu, Qing; et al. (December 2021, PhotoniX)

Terahertz technology has broad application prospects in biomedical detection. However, the mixed characteristics of actual samples make the terahertz spectrum complex and difficult to distinguish, and there is no practical terahertz detection method for clinical medicine. Here, we propose a three-step one-way terahertz model, presenting a detailed flow analysis of terahertz technology in the biomedical detection of renal fibrosis as an example: 1) biomarker determination: screening disease biomarkers and establishing the terahertz spectrum and concentration gradient; 2) mixture interference removal: clearing the interfering signals in the mixture for the biomarker in the animal model and evaluating and retaining the effective characteristic peaks; and 3) individual difference removal: excluding individual interference differences and confirming the final effective terahertz parameters in the human sample. The root mean square error of our model is three orders of magnitude lower than that of the gold standard, with profound implications for the rapid, accurate and early detection of diseases.
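The comparison above is stated in terms of root mean square error, RMSE = sqrt((1/n) Σ (predictedᵢ − observedᵢ)²), and "three orders of magnitude lower" means the ratio of the two RMSEs is about 10³. The short Python sketch below only illustrates those two calculations; the abstract does not say which quantity the RMSE is computed over, and the function names and toy numbers are hypothetical, not data from the paper.

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Root mean square error between paired predictions and observations."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(predicted))


def orders_of_magnitude_lower(model_rmse: float, reference_rmse: float) -> float:
    """How many powers of ten the model's RMSE sits below the reference RMSE."""
    return math.log10(reference_rmse / model_rmse)


# Toy illustration only: every prediction is off by exactly 1, so RMSE is 1.0.
print(rmse([1, 1, 1, 1], [2, 0, 2, 0]))
# A model RMSE of 0.001 against a reference RMSE of 1.0 is about 3 orders lower.
print(orders_of_magnitude_lower(0.001, 1.0))
```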
